Modelling the dynamic relationship between spread of infection and observed crowd movement patterns at large scale events

Understanding how contact patterns arise from crowd movement is crucial for assessing the spread of infection at mass gathering events. Here we study contact patterns from Wi-Fi mobility data of large sports and entertainment events in the Johan Cruijff ArenA stadium in Amsterdam. We show that crowd movement behaviour at mass gathering events is not homogeneous in time, but naturally consists of alternating periods of movement and rest. As a result, contact duration distributions are heavy-tailed, an observation which is not explained by models assuming that pedestrian contacts are analogous to collisions in the kinetic gas model. We investigate the effect of heavy-tailed contact duration patterns on the spread of infection using various random walk models. We show how different types of intermittent movement behaviour interact with a time-dependent infection probability. Our results point to the existence of a crossover point where increased contact duration presents a higher level of transmission risk than increasing the number of contacts. In addition, we show that different types of intermittent movement behaviour give rise to different mass-action kinetics, but also show that neither one of two mass-action mechanisms uniquely describes events.


Scientific Reports
| (2022) 12:14825 | https://doi.org/10.1038/s41598-022-19081-z www.nature.com/scientificreports/ random walk models, and investigate their impact on the spread of infection. In Section "Transmission over the network" we show the time evolution of several important measures throughout the event, and in Section "Discussion" we discuss the findings.

Data
We analyse movement data derived from Wi-Fi detections in the Johan Cruijff Arena stadium in Amsterdam. In previous work we have described various data and methods used to analyse the movement tracks obtained in this stadium, details of these can be found in our previous papers 25,26 . In those papers we analyse spatial and temporal aspects of the collective motion patterns. In this study we use the same methods as described in our most recent work 26 and we therefore refer the reader to that reference for a detailed description of the data processing and movement track reconstruction. Here, we provide a brief overview. The wireless network in the stadium consists of nearly 600 access points (APs) with known spatial coordinates. We estimate locations of anonymised smart phones using proximity detection 29 . At regular time intervals t = 10 s we determine the position of a device at one of the APs and generate temporal sequences of AP locations. Drawing a line between successive locations produces a trajectory, or movement track. To deal with the considerable amount of fluctuation due to noise, we use a simple moving average to smooth the movement tracks (see Fig. 1 for examples). We ignore the z-coordinate and simplify the analysis to two-dimensions.
We restrict our analyses to movement tracks that fulfil certain requirements. When someone is not using his/ her smartphone, the device eventually pauses all wireless communication. Therefore detection periods of devices alternate with periods without any detection. First we select devices whose detection periods span a minimum length of time, for which we use the duration of the main show of the events. For the football match the main show is the match, while for the dance event this was the DJ show. We select movement tracks with a minimum amount of detection gaps. We close the detection gaps using the interpolation method used by Rhee et al. 30 . We filter out devices that did not move at all during the whole event, as these likely represent static devices and not mobile phones.

Movement track analysis
Movement tracks of selected devices are characterised by an intermittent movement pattern. Periods of rest ('waiting times') alternate with episodes of movement that form larger displacements. This aspect of the movement data shows up clearly in the 1D projections of the movement tracks onto the x-and y-axes (see Fig. 1 for examples). The observation also agrees with our intuition about human behaviour. People stay in one place for some time, and then decide to change location, usually in one continuous movement bout. During the movement episodes individuals move with some degree of directional persistence. We have characterised these aspects in our previous works 25,26 .
The intermittent movement pattern is present in the movement data of both events. However, the reason behind the intermittent pattern is different in both events. For the football match, movement is mostly regulated and has a predictable character. We expect a typical movement trajectory that will remain mostly within a small, bounded region, and with larger displacements only at ingress, half time, and egress. For the dance event on the other hand, movement is based on individual decisions about when (and where) to move. We expect a pattern of multiple waiting times at different locations, and with variable duration. In Fig. 1 we show examples of both types of movement track from each event that agree with this intuition.
The accuracy of the movement tracks is too low for a meaningful spatial analysis of the contact patterns. However, the movement data demonstrate important properties of movement patterns in mass gathering events, namely intermittency and persistence. To study the exact effect of the movement patterns on the transmission of infection, we reproduce these properties in simulations (see Section "Random walk models"). In 25 we show that individuals move with directional persistence and superdiffusively up to a scale set by the size of the stadium. It is this aspect of the motion that we wish to reproduce in simulations with dimensionless parameters (i.e. that do not mimic the spatial properties of the stadium). In Section "Random walk models" we explain how we do this. To simulate the intermittent movement patterns of the dance event we need to correctly characterise the waiting times. We extract waiting times from the movement tracks using a method similar to Boyer et al. 31 . (Note that we have investigated waiting times more extensively in our previous work 26 . To make the current research selfcontained we repeat part of the analyses here.) We discretise the stadium into square cells of size 10 × 10 m, a size that roughly corresponds to the measurement error. Waiting times t are measured as the number of consecutive time intervals ( t i ) in the same grid cell. In Fig. 2 we show the distribution of waiting times on log-log scales. The data approximate a straight line on log-log scales, which indicates the waiting times are heavy-tailed. We fit statistical distributions to the measured waiting times, using maximum likelihood estimation (MLE) methods. We determine whether the data are better fit by an exponential distribution, a truncated power law, a log-normal, or stretched exponential distribution (see Appendix "Statistical methods and model selection" for details of the models and the MLE fitting). The exponential distribution is an indication of waiting times following a Poisson process. There have been a number of reports showing evidence that waiting times in the movement patterns of various animals (e.g. 31,32 ), and also humans 30 , follow power law distributions. In our case, the waiting times are limited by the duration of the event and can only be reasonably identified as a truncated power law. The stretched exponential and log-normal distributions are models commonly used to describe heavy-tailed phenomena in complex systems 33 . Model selection based on Akaike's information criterion (AIC) 51 indicates that the most appropriate model is the truncated power law (see Appendix "Statistical methods and model selection" for details). The MLE value of the exponent α = 1.89 (95% CI 1.8844-1.9028).

Figure 2.
Probability distribution function of the waiting times on log-log scale, together with the MLE fits of the exponential, truncated power law, log-normal, and stretched exponential distributions. Model selection shows that the power law with exponent α = 1.89 provides the best description (see Appendix "Statistical methods and model selection" for details).

Aggregated contact networks
We build contact networks by checking, at each time step, whether the pairwise proximity between devices i and j falls within a threshold distance (radius) r = 1.5 m. We build a network aggregated over the whole event, which implies that any two individuals who are within a distance < r in at least one time interval, have an edge between them in the network. In Fig. 3 we show a visualisation of the network of the football match Ajax-Feyenoord. The networks of both the football match and dance event are dense, and have small-world properties 34 . In Fig. 4 we show the empirical degree distributions, together with Poisson distributions where k is the average node degree. The degree distributions are not heavy-tailed but, particularly for the dance event, the variance is much larger than the variance of the Poisson prediction. The observation that the degree distributions are not heavy-tailed suggests that 'superspreaders' , interpreted here as individuals with relatively many different contacts, are possibly rarer than one would expect at such an event. However, this observation is based solely on the analysis of movement patterns and does not take into account other possible aspects giving rise to superspreading, such as heterogeneity in the infectiousness of individuals.
We compare the networks to random graphs of the same size N and average degree k . The graphs are Erdős-Rényi models G (N, p), in which each pair of nodes is connected with probability p = �k�/(N − 1) 13 . We see that p(k) ∼ e −�k� �k� k k!  www.nature.com/scientificreports/ the empirical networks show the small-world phenomenon L L random and C ≫ C random , especially for the football match (see Table 1). The aggregated networks highlight an important difference with the assumptions of the existing model for dense crowds 9 , which proposes that pedestrian contacts are analogous to the collision process in the kinetic gas model. According to this model, when density increases, movement becomes increasingly impeded, and contacts are restricted to nearby individuals. Thus, contacts are expected to be local, which gives rise to lattice-like contact networks, without small-world properties. Clearly, these assumptions are valid within a sufficiently short time frame during an event. The model breaks down at the time scale of the whole event. During the phases of an event when crowd density is not at peak conditions, pedestrians are free to make larger displacements without difficulty. In addition, even during high density conditions individuals may relocate to visit bars, toilets, et cetera. Therefore, any crowd movement model in a bounded space and over longer time periods will give rise to a contact network with small-world properties.
Another important property which does not directly emerge from the kinetic gas model, is revealed when we look at temporal aspects of the interactions. The longitudinal nature of the data allows us to record for each pair of individuals i and j, a sequence of time intervals in which interaction occurred. A list of such sequences constitutes a temporal network 17 . From the contact event list we extract two measures of the interaction dynamics, namely the distributions of (1) contact duration t ij between i and j, and (2) weights w ij , which represent the total contact time between i and j during the event. We measure pairwise contact duration t ij as continuous periods of physical proximity, where we simply merge and count the number of time intervals (in which physical proximity was detected) that are directly consecutive (zero gap length between them). In Fig. 5a we show the empirical probability distributions of contact duration P(�t ij ) of both the football match and dance event on log-log scales. We see that the data of both events have broad tails, which indicates that the contact duration is heterogeneously distributed. The two distributions are also very similar, despite differences in the underlying event types and corresponding crowd dynamics. In Fig. 5b we show the distribution of weights P(w ij ) on log-log scales. We see that distributions of both events are heavy-tailed, and for a part decay with similar slope values. We apply model selection to the empirical probability distributions of the dance event, using the same model set as in Section "Movement track analysis" (see Fig. 2). This shows that the truncated power law provides the best fit for both contact duration and weights. The MLEs of the power law exponents are α = 1.45 (95% CI 1.4473-1.4593) for the contact duration, and α = 1.63 (95% CI 1.6202-1.6298) for the weights. In 5a and b we show only the exponential and truncated power law fits (see Appendix "Statistical methods and model selection" for details of the model selection). The distribution of measured contact duration may depend on the chosen value of distance threshold r. In Fig. 5a and b we show the distributions of contact duration and weights of the dance event for a range of values from r = 2 to 10 m (grey lines). We see that the shape of the distributions is robust against variation in r. We note that the emergence of heavy-tailed contact duration distribution seems Table 1. Average path length L and cluster coefficient C of the two events, compared to random graphs of the same size N and average degree k . The values shown in the table are averages over 20 random realisations.  www.nature.com/scientificreports/ independent of the positioning accuracy of the underlying movement data. While we do not wish to propose a definite model for describing contact duration statistics in crowds, we expect heavy-tailed contact duration distributions to be the norm rather than the exception for mass gathering events. That the statistics of contact duration are robust across different settings has been observed before in social networks 15,35 , but it has not been demonstrated to characterise contact patterns in crowds. In the next section we propose an explanation for the heterogeneity in contact duration in terms of simple random walk models.

Random walk models
In this section we investigate the effect of the empirically observed movement patterns on disease spreading dynamics. To do so, we reproduce the observed characteristics of the movement patterns in random walk (RW) models, and demonstrate the effect on infection transmission in simulations. The specific characteristics we reproduce are persistence and intermittency. To study their effects we first add different forms of intermittency to simple random walks, and next we create random walks which show both intermittency and persistence. We first illustrate the effect of these characteristics on the aggregated, weighted networks, and see whether we can reproduce the temporal features (as shown in Fig. 5). We run simulations of systems of random walkers and derive weighted contact networks as explained in Section "Aggregated contact networks". We consider a system of N = 1000 individuals for T = 500 time steps. We use a threshold radius r = 1 to define interaction by physical proximity. The individuals start at random positions in a 100 × 100 area, measured in units of the radius r. At each time step the individuals make a step of length 2, in a random direction taken from the uniform distribution on the interval [−π , π] . Note that these settings are similar to the simulations in 22 . All modifications we make to these settings (below), are parameterised to fit in with this framework.
To study the effect of intermittent movement behaviour, we divide each random walk into alternating periods of movement and rest. For the simulation of the regulated intermittency of the football match, we simply insert two collective breaks in the simulation, each lasting 100 time steps and starting at t 1 = 50 and t 2 = 250 . To simulate the spontaneous, individual intermittency of the dance event, we take a stochastic approach. We divide each simulation time step in two parts. First, an individual draws a waiting time from a power law distribution φ(t) = t −α , where α = 1.9 , and t 5 . The parameter value α = 1.9 is as observed in the empirical data. The random walker remains at its current position for a number of n = ⌈t⌉ steps. Next, an individual draws a 'flight time' from a stretched exponential distribution The flight times following the stretched exponential with stretching parameter β = 0.86 are as empirically observed (see 25 ). The parameter controls the mean flight times. Larger flight times make the CTRW converge to the normal RW within the finite simulation time T. We show results for a range of values (see below). The random walker then takes a number of n = ⌈τ ⌉ consecutive step lengths similar to the non-interrupted model described above. As these models are variations on the continuous-time random walk framework we refer to the regulated one as regCTRW, and the stochastic model as CTRW 36 .
In Fig. 6 we show the resulting weight distributions of the simple RW, regCTRW, and CTRW. The simulated regCTRW weight distribution (green) shows a pronounced bimodal distribution (driven by the perfectly regulated collective breaks). We see that the weights of both the RW (blue) and regCTRW decay exponentially, but that the regCTRW distribution has a second peak at 100 time steps, which corresponds to the length of one break. A small number of individuals must have been in close proximity during both breaks, as there are observations at 200 time steps. The empirical weight distribution of the football match (Fig. 5b) shows a less pronounced www.nature.com/scientificreports/ bimodal distribution, but there is a clear peak in contact duration distribution (Fig. 5a) associated with the length of the match. The weights of the stochastic CTRW model (Fig. 6, red) follow a heavy-tailed distribution. We show weights of the CTRW for a range of values for = 0.1, 0.05, 0.033, 0.02 . We show the MLE fit of an exponential distribution to the simple RW, and a power law p(w) ∼ w −α to the weights of the CTRW with flight times using = 0.033 . The MLE of the power law exponent α = 2.55 . The resulting weight distribution of the CTRW resembles the empirically observed weight distributions as both are heavy-tailed distributions that can be described by a power law over a certain range (Fig. 5b). The exact forms of the distributions and relationship with the underlying movement models are out of scope for this paper.
Next we investigate the effect of the empirical movement patterns on the spread of infection. To do so, we need to reproduce both characteristics of the observed motion patterns, persistence and intermittency, in random walk simulations. In our previous work 25 we have shown that during the movement episodes individuals do not diffuse randomly, but move with some degree of directional persistence and superdiffusively up to a scale set by the size of the stadium. Although these characteristics are Lévy-like, we did not find evidence of pure Lévy walk behaviour. In random walk models, persistence in direction is expressed through autocorrelation in turning angles between successive movement steps. This behaviour can be modelled using a correlated random walk (CRW) 38,39 . Angles between successive movement steps are sampled from the Von Mises distribution 40 . Again, we compare three RWs, where the first is the (continuously moving) CRW, the second is the CRW interrupted by collective breaks, and the third is a stochastic intermittent CRW. The waiting times are added to the models according to the mechanisms described above. As these models are now continuous-time correlated random walks we refer to them as regCTCRW, and CTCRW.
We compare the infection spreading dynamics of the three models (CRW, regCTCRW, and CTCRW). We focus on a basic susceptible-exposed (SE) model, which we run on top of the simulated random walks. Due to the short time span of the events we're simulating, we ignore transitions E → I , and recovery ( I → R ). We divide the population in the three states susceptible (S), exposed (E), and infectious (I). At t = 0 we introduce a number of infectious individuals I 0 (seeds), which remains constant throughout the simulation. If a susceptible agent interacts with an infectious agent, there is a probability p inf that the susceptible agent transfers to the exposed state. In Fig. 7 we compare the time evolution of the number of exposed E(t) of the CRW, with both (a) the regCTCRW, and (b) CTCRW, using I 0 = 1 and p inf = 1 . Not surprisingly, the spreading stops completely during the breaks for the regCTCRW (Fig. 7a), and is slowed down for the CTCRW (Fig. 7b), due to agents effectively moving less, and having less encounters with other agents.
In reality, infection transmission requires prolonged contact between two individuals, instead of occurring instantaneously. We assume that the number of infectious units an individual is exposed to increases with contact duration, and that the risk of infection increases accordingly. To implement this, we introduce an infection probability as a function of contact duration, defined as where t ij is the cumulative contact duration between individuals i and j, which is measured as the number of time steps spent in proximity, and µ is a parameter modulating the increase of p inf with t ij . Equation (1) is based on a model proposed by Riley et al. 41 , and further developed in a number of publications (e.g. see [42][43][44][45]. Note that the time steps spent in close contact are not necessarily consecutive. Because this model produces lower numbers of exposed agents, we set I 0 = 10 to enhance the effect within the simulation time. In Figure 7 we show the time evolution of the number of exposed E(t) of the CRW and (a) the regCTCRW, and (b) the CTCRW, using µ = 0.04 in Eq. (1). The parameter value µ = 0.04 is arbitrarily chosen and serves illustrative purposes of the simulations. Below we explore further some of the implications of this parameter. We see that in this case (i.e. p inf ∼ t ij ), the number of exposed agents on the regCTCRW model (red dashed curve) jumps up at the start of each break (shaded grey areas), due to individuals having prolonged contacts with other agents. However, we also see that the curve rapidly saturates, due to exhaustion of the number of susceptibles near an infectious individual. Because of this, the spreading on the regCTCRW ends up producing lower numbers of exposed than the CRW. This shows that a stationary (e.g. seated) crowd imposes an increased infection risk if the infection probability is time-dependent, but that the total number of exposed individuals is bounded by the number of individuals in proximity. www.nature.com/scientificreports/ In Fig. 7b we see that the CTCRW model produces higher numbers of exposed agents than the CRW. The infection spreading again benefits from agents occasionally not moving, and having prolonged contacts with other agents. However, due the random, non-overlapping periods of rest there are no saturation effects. This result points to the fact that, if the infection probability is time-dependent (i.e. p inf ∼ t ij ), there exists a crossover point where prolonged contacts present more risk than encountering new individuals. If the transmissibility [i.e. µ in Eq. (1)] becomes lower/higher disease transmission benefits more/less from longer contact duration. In Fig. 7c we show the dependency of the final number of exposed E(t) at T = 500 , produced by both the CRW and CTCRW, on the value of µ [in Eq. (1)]. We see a clear crossover where the two curves intersect and one model poses a higher level of transmission risk than the other. This shows that, if the infection probability is time-dependent, an intermittently moving but freely mixing crowd may present the highest level of transmission risk.
These results also show that the actual transmission risk imposed by different types of crowd movement (i.e. stationary/dynamic) depends on the infection probability. As real crowded events are expected to consist of mixtures of different crowd movement behaviours, estimating the risk involved is not straightforward.

Transmission over the network
So far, we have studied the effect of two types of intermittent crowd movement (regulated and stochastic) on the spread of infection, in simulations with constant population density. Real life events occur in a number of distinct phases such as ingress, dwell times, and egress. During these phases the crowd density changes. In this section we show how the two types of intermittent crowd movement relate to crowd density and give rise to different mass-action kinetics. The data give information about how contact patterns are formed over time. More specifically, it gives information on when contacts occur in time. The longitudinal nature of the data set allows us to see when, and for how long, these contacts occur. Instead of aggregating the contacts and representing it as a static network (as in Section "Aggregated contact networks"), we look at the network in each separate time interval. We can refer to these networks as 'time slices' . The number of edges in each time slice represents the number of contacts in the time interval. In Fig. 8 we show the time evolution of the number of contacts. Note that this is similar to 'collisions' within the gas kinetic framework (Cf. 9 ). Tracking the contact rate over time enables us to study its dependence on the total crowd size, i.e. its effective mass-action kinetics. The contact curves can be seen to have a similar shape as the crowd size (N) curves (red lines in Fig. 8). We can visualise this aspect better by normalising the contacts using crowd size N(t), see Fig. 9. The contact curves can now both be seen as fluctuating around a constant value. This suggests contact rates increase linearly with crowd size or density. However, we also see that, especially in the case of the dance event, there is considerable fluctuation. This suggests that additional factors are required to explain variations in the contact patterns. We return to this issue at the end of this section.
To explore transmission of infection based on the empirical contact patterns, we consider an SEIR model which we run on top of the measured contact networks. We divide the population in the three states susceptible (S), exposed (E), and infectious (I). Due to the short time span of the events, we ignore transitions E → I , and recovery ( I → R ). Therefore, secondary infections (i.e. individuals passing on the disease after being exposed) are not included. We randomly select an individual which we introduce as an infectious seed, while all other individuals are susceptible. We assume transmission takes place at first encounter (i.e p inf = 1 ), and when there is contact between the seed and a susceptible, the last one changes into the exposed (E) state. We record this event, and track the growth of the number of exposed individuals from one particular seed over time, which is the cumulative incidence curve. We run this process for each individual in the sample. In Fig. 10 we show for www.nature.com/scientificreports/ both events all individual incidence curves (grey), and also the means (blue and green). Note that the cumulative incidence curves are simply the growth of the node degrees k over time, and the mean is just the time evolution of the mean degree k . We are especially interested in possible variation in the growth rate of the incidence curves during the event. This is better visualised by recording the number of infectious contacts per time interval. In Fig. 11 we show incidence rate curves (dE/�t) , where the brackets ... denote averaging over all possible individuals in the sample. Note that the incidence curves in Fig. 10 are simply the integral over this. The incidence rate curves represent newly exposed individuals per time interval, where we assume disease transmission takes place at first encounter between an infectious and a susceptible individual (i.e p inf = 1 ). We see that both curves in Fig. 11 are fluctuating. For the football match the curve can be seen to fluctuate around a constant value, but it is slowly declining and then has large peak after the match. Note that this particular shape has no obvious relationship with density (see crowd sample curve in Fig. 8), and if at all, seems negatively correlated. For the dance event the curve has a more pronounced shape, with the largest peak occurring just before the start of the DJ show, and a second peak right after the end of the show.
It is interesting to evaluate Figs. 8, 9, 10 and 11 in relation to the two different approaches to mass-action assumptions of homogeneous mixing 9,20,46 . According to the density-dependent mechanism, the rate of increase of exposed individuals follows (within a SEIR framework) dE/dt = βSI , where β is the transmission rate parameter. According to the frequency-dependent mechanism, the relation is dE/dt = βSI/N . For the density-dependent mechanism to hold, the spatial area must be constant so that an increase in population size is an increase  www.nature.com/scientificreports/ in density. The frequency-dependent mechanism is not only more in agreement with our natural intuition, but also more supported by empirical evidence 20,47 . Seen from this perspective, the results of the football match and the dance event paint two different pictures. Note that in the density-dependent regime the exposed rate curve should scale in a linear fashion with population density, while in the frequency-dependent regime the exposed rate curve should assume a constant value, independent of the crowd density. In Fig. 8 we see that contact patterns (collision frequency) evolve similarly as the crowd size curves (red dashed lines). This suggests contact rates increase linearly with crowd size, and both events are in the density-dependent regime. Figure 9 shows that this aspect is stronger for the football match, while the dance event shows more deviation from this rule. This distinction between the events is more pronounced in Fig. 11. In Fig. 11a we see that for the football match, dE/dt, as approximated by (dE/�t) , seems to fluctuate around a constant value, which indicates the transmission rate is in agreement with the frequency-dependent mechanism (i.e. ∼ βSI/N ), and independent of population size N or density. On the other hand, Fig. 11b shows that, for the dance event, dE/dt still closely resembles the crowd size curve N(t) (red dashed line in Fig. 8b). This suggests that at least for some part of the event, the transmission rate increases linearly with population size (i.e. ∼ βSI ), as formulated by the density-dependent mass-action law. These observations agree with our intuition as the area of the stadium is constant and an increase in crowd size means an increase in density. However, this effect is much less important for the football match, where the amount of mixing is constrained. Apart from entry and exit, people remain seated with a fixed number of individuals in close proximity, independent of the total crowd size. It would therefore be natural that contact patterns in the regulated crowd movements of the football match follow the frequency-dependent regime, while the contact patterns from the freely moving dance event visitors follow the density-dependent mechanism.
Nevertheless, in Figs. 8, 9, 10 and 11 we also see considerable deviation from both mass-action mechanisms. We see that, even at constant N (e.g. during the DJ show), contact rates and (dE/�t) vary. For both events we see a slow decline of (dE/�t) which might be explained by the exhaustion of susceptible individuals in someone's proximity. However, in both events we see a large peak at the end of the event, despite a decrease in crowd size N at that point in time. This suggests that the transmission rate β is not constant and varies over time, independent of S, I, or N. The transmission rate β is the product of infection probability p and contact rate C, i.e. β = p C 47 . In the simulation analysis (i.e. Fig. 11) p = 1 is constant, so it is the contact rate C which is timevarying. These observations support the idea that neither the density-dependent mechanism or the frequencydependent mechanism uniquely describes the scaling of contact rates in relation to crowd density, and that in many situations the contact rate is a combination of the two extremes 9,20 . However, the results also suggest that, in addition to the two types of mass-action mechanisms, additional factors are required to explain variations in the contact patterns, and that these factors are related to changes in crowd movement behaviour during an event.

Discussion
In this study we present several important features of the transmission of pathogens at large mass gathering events. These features, brought to light by enlarging the observational time frame, are illustrated by the temporally weighted contact networks, aggregated over the course of the event. First of all, the observed small-world property of the contact networks is not explained by the conceptual framework of existing contact models. According to this framework, contacts are restricted to nearby individuals, which gives rise to lattice-like contact networks, without small-world properties. This assumption is applicable only to a sufficiently short time frame. The fact that the contact networks fall somewhere between a lattice and a complete (fully connected) network indicates that pedestrian movements at events naturally consist of a mixture of local and long-distance displacements. This www.nature.com/scientificreports/ shows that movement behaviour during large events is not homogeneous in time, and key movement parameters are time-varying. Secondly, the aggregated contact networks have heavy-tailed contact duration distributions. We show that this observation is independent of the positioning accuracy of the underlying Wi-Fi-based movement data. Using simple random walk models we demonstrate the link between intermittent movement behaviour, consisting of alternating periods of movement and rest, and heavy-tailed contact duration distributions. Heterogeneously distributed contact duration is common to social networks but so far has not been demonstrated for crowds. We note that intermittent movement behaviour, and corresponding heterogeneously distributed contact duration, are expected to be the norm rather than the exception for mass gathering events. Using simulation, we have shown how different types of intermittent movement behaviour interact with key infection processes, indicating that the temporal dynamics of the contact patterns strongly influence transmission dynamics at mass gatherings. Our results expose the existence of a crossover point where increased contact duration presents a higher level of transmission risk than increasing the number of contacts. Therefore, understanding the temporal dynamics of the underlying crowd movements as well as specific disease transmission characteristics are both essential for estimating transmission rates at mass gathering events.
To isolate the effect of movement patterns on pathogen transmission, we have made several simplifying assumptions, such as conditions being constant throughout the stadium, and only considering transmission during close contact or in physical proximity. In addition we have assumed key epidemiological parameters, such as infection probability and susceptibility, to be equal for all individuals. Both of these are certainly simplifications. Investigating the impact of heterogeneity in these factors on the spreading dynamics at crowded events are important opportunities for further research.
The longitudinal nature of the data set has allowed us to study how contact rates and incidence evolve in relation to crowd size. We have shown that different types of events give rise to one of two types of mass-action kinetics. We have also shown that contact rates and spreading on the empirical movements are time-varying. This suggests that contact rates are probably the result of a dynamic combination of the two mass-action mechanisms. During an event one of the two mechanisms may become more dominant due to changes in collective movement behaviour. However, even within one of the two mass-action frameworks, parameters are not necessarily constant, giving rise to even more variation in contact rates. For example, individual movement parameters such as velocity, step length and persistence may change over time, possibly in response to external factors. These results indicate that a detailed understanding of pedestrian movement behaviour in dense crowds is crucial for predicting the specific mass-action process that will emerge.
Our work shows the importance of a more integrative approach to mass gathering events, which considers these events in their full spatio-temporal complexity. Here we have taken a first step and show how different event scenarios, with different intermittent crowd behaviours, lead to different contact rates. A more holistic approach to events is essential if we want models to inform policy makers on specific types of events and the risks they present. It will also increase our understanding of the different phases during mass gathering events and the varying levels of risk they present in relation to the transmission characteristics of specific pathogens. This could ultimately serve event organisers in planning a safe return to normal operations as well as developing long-term risk protocols for large mass gatherings.

Data availability
The analysis of Wi-Fi detection data in the current study is licensed under a contractual agreement between the University of Amsterdam and the Johan Cruijff Innovation Amsterdam. Derived movement tracks, and all custom code used to analyse the data in this study, are freely available at: https:// github. com/ phili prn/ Crowd-Epide miolo gy.

Appendix: Statistical methods and model selection
We use statistical methods of Clauset et al. 48 and Edwards et al. 49 for fitting the distributions. We also check results using the Python powerlaw package 37 , which has implemented the methods from 48 .

Models.
We fit the following statistical distributions to various measured quantities: • The exponential distribution, with probability density function defined as: • The truncated power law distribution distribution, with probability density function defined as: • The stretched exponential distribution, with probability density function defined as: • The log-normal distribution, with probability density function defined as: where t c is the exponential cutoff value, and a is the lower bound of the fitting range.
(2) where n is the number of data points, and a is the lower bound of the fitting range. In this research a is loosely determined as the value after which the decay starts in the empirical frequency distribution. There are no analytical solutions for the MLEs of the truncated power law distribution, stretched exponential, and log-normal distributions. In this case we numerically minimise the negative log-likelihood function For the numerical minimisation we use Python library functions (following 37 ).

Confidence intervals of estimated parameters.
To compute confidence intervals we use the likelihood profile method 50 . The method compares the likelihood of the MLE of a parameter θ with other values of that parameter. According to statistical theory has a chi-square distribution with one degree of freedom. We can find 95% confidence interval boundaries by using the fact that Pr{χ 2 3.84} = 0.95 , and numerically solving L(θ) − L(θ mle ) = 1.92 . For models with two parameters we perform the test for each parameter separately. We systematically vary the parameter of interest, and at each instance compute the value for the other parameter that maximises the likelihood at that point.

Akaike model selection.
To compute Akaike weights we need the Akaike Information Criterion (AIC) for which we require the value of the negative log-likelihood function at the maximum (MLE), and where K is the number of parameters to be estimated 49,51 . The AIC differences are where AIC min is the AIC of the model with the minimum AIC, which is considered as the best model. The Akaike weights are give by where M is the set of models to be compared.

Model selection results.
Waiting times In Section "Movement track analysis" we fit the statistical models given in Appendix "Models" to the measured waiting times (see Fig. 2). In Table 2 we show model selection results based on Akaike weights, using a = 6 t ( = 60 s). The truncated power law ( w tpl = 1 ) provides the best description of the data.
Contact duration In Section "Aggregated contact networks" we fit the statistical models given in Appendix "Models" to the measured contact duration distribution (see Fig. 5a). In Table 3 we show model selection results based on Akaike weights, using a = 2 t ( = 20 s). The truncated power law ( w tpl = 1 ) provides the best description of the data.   Table 2. Overview of the model selection results for the measured waiting times (see Fig. 2), based on Akaike weights.  Table 3. Overview of the model selection results for the measured contact duration (see Fig. 5a), based on Akaike weights.